Training Robust Acoustic Models Using Features of Pseudo-Speakers Generated by Inverse CMLLR Transformations

Authors

Arata Itoh

Sunao Hara

Norihide Kitaoka

Kazuya Takeda

Abstract

This paper proposes a novel acoustic model training method based on speech feature generation. Speaker adaptation methods have been widely used for decades, but all existing adaptation methods require adaptation data. In contrast, our proposed method creates speaker-independent acoustic models that cover not only known but also unknown speakers. We do this by generating features with inverse maximum likelihood linear regression (MLLR) transformations and then training our models on these features. First, we obtain MLLR transformation matrices from a limited number of existing speakers. Then we extract the bases of the MLLR transformation matrices using PCA. The distribution of the weight parameters that express the MLLR transformation matrices of the existing speakers is estimated. Next, we generate pseudo-speaker MLLR transformations by sampling weight parameters from this distribution, and apply the inverse of each transformation to the normalized features of the existing speakers to generate the pseudo-speakers' features. Finally, we train the acoustic models using these features. Evaluation results show that the resulting acoustic models are robust to unknown speakers.
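
The steps in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each speaker transform is an affine feature-space map y = A x + b stored as [A | b] (in line with the CMLLR named in the title), that the PCA weights are modelled as independent Gaussians, and that the function names below are hypothetical.

```python
import numpy as np


def fit_transform_basis(transforms, n_components):
    """PCA over flattened speaker transforms.

    transforms: array of shape (S, D, D + 1), one [A | b] per known speaker.
    Returns the mean transform (flattened), the PCA bases, and the
    per-speaker weights in that basis.
    """
    n_speakers = transforms.shape[0]
    flat = transforms.reshape(n_speakers, -1)
    mean = flat.mean(axis=0)
    centered = flat - mean
    # Rows of vt are the principal directions of the centered transforms.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]
    weights = centered @ basis.T
    return mean, basis, weights


def sample_pseudo_transforms(mean, basis, weights, n_samples, shape, rng):
    """Draw pseudo-speaker transforms by sampling PCA weights.

    Assumption: each weight is Gaussian and independent of the others.
    """
    mu = weights.mean(axis=0)
    sigma = weights.std(axis=0, ddof=1)
    sampled = rng.normal(mu, sigma, size=(n_samples, mu.shape[0]))
    flat = mean + sampled @ basis
    return flat.reshape((n_samples,) + tuple(shape))


def apply_inverse_transform(transform, normalized_feats):
    """Map normalized features back to a pseudo-speaker's feature space.

    Inverts y = A x + b, i.e. x = A^{-1} (y - b).
    normalized_feats: array of shape (T, D).
    """
    a, b = transform[:, :-1], transform[:, -1]
    return np.linalg.solve(a, (normalized_feats - b).T).T
```

The generated features for all sampled pseudo-speakers would then be pooled as training data for the acoustic model. A sampled matrix A is not guaranteed to be well conditioned, so a practical version would likely check or constrain it before inverting.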

Related articles

Acoustic Model Training Using Pseudo-Speaker Features Generated by MLLR Transformations for Robust Speaker-Independent Speech Recognition

A novel speech feature generation-based acoustic model training method for robust speaker-independent speech recognition is proposed. For decades, speaker adaptation methods have been widely used. All of these adaptation methods need adaptation data. However, our proposed method aims to create speaker-independent acoustic models that cover not only known but also unknown speakers. We achieve th...

Acoustic analysis and feature transformation from neutral to whisper for speaker identification within whispered speech audio streams

Whispered speech is an alternative speech production mode from neutral speech, which is used by talkers intentionally in natural conversational scenarios to protect privacy and to avoid certain content from being overheard or made public. Due to the profound differences between whispered and neutral speech in vocal excitation and vocal tract function, the performance of automatic speaker identi...

Emotion recognition using linear transformations in combination with video

The paper discusses the usage of linear transformations of Hidden Markov Models, normally employed for speaker and environment adaptation, as a way of extracting the emotional components from the speech. A constrained version of Maximum Likelihood Linear Regression (CMLLR) transformation is used as a feature for classification of normal or aroused emotional state. We present a procedure of incre...

Acoustic Model Identification Using Inverse Model

Sound measured at various points around the environment can be evaluated by a series of multi-pole sources and their acoustic strength can be acquired. In this numerical study, a method, called the inverse method, was examined to achieve this goal. A variety of arrangements of different sources were considered and the acoustic strength of these sources was acquired. Through the application of t...

Rapid unsupervised speaker adaptation robust in reverberant environment conditions

We expand the conventional rapid adaptation based on N-closest speakers' sufficient statistics (suff stat) to achieve robustness under reverberant conditions. We integrated our fast dereverberation technique based on optimized multi-band spectral subtraction as pre-processing. This removes the late reflection components of the reverberant signal effectively and fast. Speakers’ suff stat are then ...

Publication date: 2011
